The human ionomics data set has already been pre-processed, so we only need to derive the symbolic data:
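# Read the pre-processed ionomics data; the first column holds the gene names
dat <- read.table("./test-data/human.csv", header = TRUE, sep = ",")
# Drop rows with duplicated gene names
dat <- dat[!duplicated(dat[, 1]), ]
colnames(dat)[1] <- "Line"
# Convert the ionomics data to symbolic data with a threshold of 3
dat_symb <- symbol_data(x = dat, thres_symb = 3)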
Random samples of the ionomics data and the symbolic data look like this:
dat %>% sample_n(10) %>%
  kable(caption = 'Ionomics data', digits = 2, booktabs = TRUE) %>%
  kable_styling(full_width = FALSE, font_size = 10,
                latex_options = c("striped", "scale_down"))
| Line | As | B | Ca | Cd | Co | Cu | Fe | K | Li | Mg | Mn | Mo | Na | Ni | P | S | Se | Zn |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| SIRT1 | 0.62 | -1.65 | -1.77 | -1.41 | -1.99 | 0.60 | -0.82 | -3.86 | 1.17 | 3.23 | -1.96 | -0.40 | -1.74 | -2.18 | 2.08 | -1.11 | -0.96 | 1.12 |
| JAK1 | -2.63 | 1.75 | -0.86 | 1.24 | -0.56 | -0.11 | 2.93 | 0.82 | 1.57 | 1.79 | 0.91 | 1.23 | 1.10 | 2.19 | 0.65 | 0.43 | -1.61 | 1.50 |
| TUSC3 | 1.33 | -2.22 | -1.63 | -0.49 | 0.49 | -2.26 | 2.65 | 4.77 | 3.04 | 4.88 | -1.43 | 3.42 | -2.40 | -0.79 | 1.53 | 1.78 | -1.00 | 2.81 |
| HRSP12 | 2.27 | 1.50 | -1.43 | 0.91 | -0.82 | -0.42 | -1.14 | -0.52 | 1.66 | 1.11 | -0.89 | 0.10 | -0.59 | -0.30 | 2.53 | 1.52 | -1.39 | 1.60 |
| MRPL50 | -0.02 | 1.12 | 1.27 | 3.79 | 2.25 | 1.89 | 2.61 | 0.50 | -1.09 | 1.36 | 1.14 | 0.87 | 1.27 | 1.31 | 0.13 | 0.84 | -1.42 | 0.78 |
| AEBP2 | -0.02 | -0.12 | -3.82 | -0.84 | -1.83 | -0.78 | -0.82 | -2.55 | 1.59 | -2.13 | 0.69 | -3.49 | -1.03 | 0.70 | -1.20 | -0.77 | -0.18 | 0.03 |
| PIN1L | -0.02 | -0.81 | -1.50 | -0.75 | -0.87 | 2.45 | -2.16 | -1.24 | 0.12 | -1.53 | 3.54 | -0.40 | -1.35 | -0.70 | -2.58 | -0.80 | -0.52 | -2.26 |
| WDR17 | 0.86 | -2.46 | -0.17 | -3.60 | -1.00 | -1.45 | 1.40 | 1.89 | -2.44 | 2.69 | -0.32 | -0.06 | -1.06 | -0.16 | 1.29 | -0.14 | -1.24 | 1.10 |
| SQLE | -1.24 | -0.68 | -1.20 | -0.42 | -1.30 | 6.73 | 1.75 | -1.99 | -0.35 | 0.72 | -3.74 | -0.81 | -0.81 | 0.65 | 1.56 | -1.25 | 0.43 | 4.31 |
| EIF2B2 | -0.13 | -2.80 | -2.47 | -0.48 | -1.04 | 1.23 | -1.74 | -1.07 | -0.59 | -2.24 | -1.76 | -1.01 | -2.63 | 1.60 | 1.57 | -1.77 | 1.11 | -0.44 |
dat_symb %>% sample_n(10) %>%
  kable(caption = 'Symbolic data', booktabs = TRUE) %>%
  kable_styling(full_width = FALSE, font_size = 10,
                latex_options = c("striped", "scale_down"))
| Line | As | B | Ca | Cd | Co | Cu | Fe | K | Li | Mg | Mn | Mo | Na | Ni | P | S | Se | Zn |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ZNF197 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| UBE2B | 0 | -1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | -1 | 0 | 0 |
| ZDHHC21 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| CDK5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | -1 |
| ALG10 | 0 | -1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | -1 | -1 | 0 | 0 | 0 | 0 |
| SRPR | 0 | 0 | 0 | 0 | 0 | -1 | 1 | 0 | 0 | 0 | -1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| WDR17 | 0 | 0 | 0 | -1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| ZNF192 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 0 |
| KARS | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | -1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| PGLYRP2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
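The two tables suggest how the symbolic coding relates to the threshold: for WDR17, only the Cd value (-3.60) exceeds 3 in absolute value, and it is the only non-zero symbol (-1). The following is a minimal sketch of such a thresholding rule, assuming that values beyond the threshold are coded by their sign and everything else is coded as 0. The helper `to_symbol` is hypothetical and is not the package's `symbol_data()`, whose exact rule (for example strict versus non-strict inequality) is not shown here.

```r
# Toy matrix: the WDR17 row copies four values from the ionomics table above;
# geneA and geneB are made up for illustration.
x <- matrix(c(0.86, -2.46, -0.17, -3.60,
              3.20,  0.10, -3.05,  0.00,
              1.00,  2.99, -2.99,  0.50),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("WDR17", "geneA", "geneB"),
                            c("As", "B", "Ca", "Cd")))

# Hypothetical helper (an assumption, not the package implementation):
# +1 above the threshold, -1 below the negative threshold, 0 otherwise.
to_symbol <- function(x, thres = 3) {
  sign(x) * (abs(x) > thres)
}

symb <- to_symbol(x, thres = 3)
symb
```

Under this assumed rule the WDR17 row becomes 0, 0, 0, -1, which agrees with the corresponding row of the symbolic table.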
The data are then filtered by removing all genes whose symbolic values are zero for every ion:
idx <- rowSums(abs(dat_symb[, -1])) > 0
dat <- dat[idx, ]
dat_symb <- dat_symb[idx, ]
dim(dat)
#> [1] 434 19
The hierarchical cluster analysis is the key part of the gene network and gene enrichment analyses. The methodology is as follows:
One example is:
min <- 8
clust <- gene_clus(dat_symb[, -1], min_clust_size = min)
names(clust)
#> [1] "clus" "idx" "tab" "tab_sub"
clust$tab_sub
#> cluster nGenes
#> 1 14 11
#> 2 4 10
#> 3 24 10
#> 4 79 10
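To make the idea of clustering symbolic profiles and keeping only sufficiently large clusters concrete, here is a generic base R sketch. The internals of `gene_clus()` are not shown in this section, so the distance (Manhattan), the linkage (complete), the cut height (0, which groups identical profiles) and the name `min_size` are all assumptions chosen purely for illustration.

```r
# Toy symbolic profiles (made up): 6 genes x 3 ions with values in {-1, 0, 1}.
symb <- rbind(g1 = c( 1,  0, 0),
              g2 = c( 1,  0, 0),
              g3 = c( 1,  0, 0),
              g4 = c( 0, -1, 1),
              g5 = c( 0, -1, 1),
              g6 = c(-1,  0, 0))

# Assumed settings: Manhattan distance, complete linkage, tree cut at height 0
# so that only genes with identical symbolic profiles share a cluster.
hc   <- hclust(dist(symb, method = "manhattan"), method = "complete")
clus <- cutree(hc, h = 0)

# Tabulate cluster sizes and keep clusters with at least `min_size` genes.
min_size <- 2
tab      <- table(cluster = clus)
tab_sub  <- tab[tab >= min_size]
keep     <- clus %in% names(tab_sub)

tab_sub
rownames(symb)[keep]
```

In this toy example g6 forms a singleton cluster and is dropped, while the two larger clusters are retained.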
The gene network uses both the ionomics and symbolic data. The similarity measures on ionomics data are used to construct the network. Before creating a network, these analyses are further filtered by:
The similarity measures implemented are pearson, spearman, kendall, cosine, mahal_cosine and hybrid_mahal_cosine.
We use the Pearson correlation as the similarity measure for the network analysis:
net <- GeneNetwork(data = dat,
                   data_symb = dat_symb,
                   min_clust_size = min,
                   thres_corr = 0.6,
                   method_corr = "pearson")
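As a rough illustration of how a similarity measure and a threshold can define network edges, the sketch below computes gene-by-gene Pearson correlations on toy data and keeps the pairs at or above 0.6. This is an assumption about the general approach, not the implementation of `GeneNetwork()`; in particular, whether the package thresholds the raw or the absolute correlation is not stated here.

```r
set.seed(1)
# Toy data (made up): 5 genes x 18 ions.
x <- matrix(rnorm(5 * 18), nrow = 5,
            dimnames = list(paste0("g", 1:5), NULL))
x["g2", ] <- x["g1", ] + rnorm(18, sd = 0.1)  # make g2 closely track g1

# cor() correlates columns, so transpose to get gene-by-gene similarities.
sim <- cor(t(x), method = "pearson")

# Keep each gene pair once (upper triangle) if it meets the threshold.
thres <- 0.6
idx   <- which(sim >= thres & upper.tri(sim), arr.ind = TRUE)
edges <- data.frame(from = rownames(sim)[idx[, "row"]],
                    to   = colnames(sim)[idx[, "col"]],
                    corr = sim[idx])
edges
```

The resulting edge list contains at least the g1 to g2 pair and could be passed to any graph library for plotting or community detection.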
The network with nodes coloured by the symbolic data clustering is:
net$plot.pnet1
Figure 1: Network with Pearson correlation: symbolic clustering
The same network, but nodes are coloured by the network community detection:
net$plot.pnet2
Figure 2: Network with Pearson correlation: community detection
The network analysis also returns a network impact and betweenness plot:
net$plot.impact_betweenness
Figure 3: Network with Pearson correlation: impact and betweenness
For comparison purposes, we use Mahalanobis Cosine:
net_2 <- GeneNetwork(data = dat,
                     data_symb = dat_symb,
                     min_clust_size = min,
                     thres_corr = 0.6,
                     method_corr = "mahal_cosine")
net_2$plot.pnet1
Figure 4: Network with Mahalanobis Cosine: symbolic clustering
net_2$plot.pnet2
Figure 5: Network with Mahalanobis Cosine: community detection
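This section does not define the Mahalanobis Cosine measure. One common reading is the cosine similarity computed after transforming the data into Mahalanobis space, that is, using the inner product weighted by the inverse covariance matrix. The sketch below follows that reading on toy data; the centring step and the use of the ion covariance matrix are assumptions, and the package's `mahal_cosine` option may differ.

```r
set.seed(1)
# Toy data (made up): 50 genes x 4 ions.
x  <- matrix(rnorm(50 * 4), nrow = 50)
xc <- scale(x, center = TRUE, scale = FALSE)  # assumed: centre each ion

# Inverse covariance of the ions and its Cholesky factor: Sinv = t(R) %*% R.
Sinv <- solve(cov(xc))
R    <- chol(Sinv)

# Transform each gene profile so that plain inner products in z equal
# inverse-covariance-weighted inner products in xc.
z <- xc %*% t(R)

# Cosine similarity between all pairs of transformed gene profiles.
norms <- sqrt(rowSums(z^2))
sim   <- (z %*% t(z)) / outer(norms, norms)
sim[1:3, 1:3]
```

The result is a symmetric gene-by-gene matrix with ones on the diagonal, which can be thresholded in the same way as a correlation matrix.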
Again, we use Hybrid Mahalanobis Cosine:
net_3 <- GeneNetwork(data = dat,
                     data_symb = dat_symb,
                     min_clust_size = min,
                     thres_corr = 0.6,
                     method_corr = "hybrid_mahal_cosine")
net_3$plot.pnet1
Figure 6: Network with Hybrid Mahalanobis Cosine: symbolic clustering
net_3$plot.pnet2
Figure 7: Network with Hybrid Mahalanobis Cosine: community detection
The enrichment analysis is based on the symbolic data clustering. The genes in each cluster are treated as a target gene set, while the genes in the whole data set form the universal gene set.
The KEGG enrichment analysis with a p-value threshold of 0.05:
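Over-representation of a pathway or term in a target gene set relative to a universe is commonly assessed with a hypergeometric test. The sketch below shows that calculation in base R. Whether `kegg_enrich()` and `go_enrich()` use exactly this test is not stated here, and all counts in the example are hypothetical rather than taken from the results in this document.

```r
# Hypothetical counts (assumptions for illustration only):
universe_size <- 400  # annotated genes in the universal gene set
pathway_size  <- 9    # universe genes annotated to the pathway
cluster_size  <- 10   # annotated genes in the target cluster
overlap       <- 2    # cluster genes annotated to the pathway

# Probability of seeing at least `overlap` pathway genes when drawing
# `cluster_size` genes from the universe without replacement.
p_value <- phyper(overlap - 1,
                  m = pathway_size,
                  n = universe_size - pathway_size,
                  k = cluster_size,
                  lower.tail = FALSE)
p_value
```

A pathway would then be reported for a cluster when this p-value falls below the chosen threshold.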
kegg <- kegg_enrich(data = dat_symb, min_clust_size = min, pval = 0.05,
                    annot_pkg = "org.Hs.eg.db")
#' kegg
kegg %>%
  kable(caption = 'KEGG enrichment analysis',
        digits = 3, booktabs = TRUE) %>%
  kable_styling(full_width = FALSE, font_size = 10,
                latex_options = c("striped", "scale_down"))
| Cluster | KEGGID | Pvalue | Count | Size | Term |
|---|---|---|---|---|---|
| Cluster 24 (10 genes) | 00510 | 0.032 | 2 | 9 | N-Glycan biosynthesis |
| Cluster 79 (10 genes) | 00520 | 0.000 | 2 | 4 | Amino sugar and nucleotide sugar metabolism |
Note that the KEGG enrichment analysis may return no results.
Arguments such as min_clust_size can be changed as appropriate.
The GO term enrichment analysis with the BP ontology (the other two ontologies are MF and CC):
go <- go_enrich(data = dat_symb, min_clust_size = min, pval = 0.05,
                ont = "BP", annot_pkg = "org.Hs.eg.db")
#' go
go %>% head() %>%
  kable(caption = 'GO Terms enrichment analysis',
        digits = 3, booktabs = TRUE) %>%
  kable_styling(full_width = FALSE, font_size = 10,
                latex_options = c("striped", "scale_down"))
| Cluster | ID | Description | Pvalue | Count | CountUniverse | Ontology |
|---|---|---|---|---|---|---|
| Cluster 14 (11 genes) | GO:0009615 | response to virus | 0.0132 | 2 | 8 | BP |
| Cluster 14 (11 genes) | GO:0007059 | chromosome segregation | 0.025 | 2 | 11 | BP |
| Cluster 4 (10 genes) | GO:0051092 | positive regulation of NF-kappaB transcription factor activity | 0.0023 | 2 | 3 | BP |
| Cluster 4 (10 genes) | GO:0043410 | positive regulation of MAPK cascade | 0.0197 | 2 | 8 | BP |
| Cluster 4 (10 genes) | GO:0051090 | regulation of DNA-binding transcription factor activity | 0.0197 | 2 | 8 | BP |
| Cluster 4 (10 genes) | GO:0006955 | immune response | 0.0327 | 3 | 27 | BP |
The exploratory analysis performs PCA and correlation analysis for ions in terms of genes. Note that this analysis treats ions as samples/replicates, while genes are treated as variables/features. The exploratory analysis is normally carried out at an early stage of the workflow.
We apply it to the pre-processed data dat before any other analysis:
expl <- ExploratoryAnalysis(data = dat)
names(expl)
#> [1] "plot.pca" "data.pca.load" "plot.corr" "plot.corr.heat"
#> [5] "plot.heat" "plot.net"
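The orientation used here (ions as samples, genes as variables) can be illustrated with base R on toy data. This is only a sketch of the idea using `prcomp()` and `cor()`; it does not reproduce the internals of `ExploratoryAnalysis()`, and the centring and scaling choices are assumptions.

```r
set.seed(1)
# Toy data (made up): 6 genes x 4 ions, shaped like dat without the Line column.
x <- matrix(rnorm(6 * 4), nrow = 6,
            dimnames = list(paste0("gene", 1:6), c("Ca", "Fe", "K", "Zn")))

# Transpose so that ions are rows (samples) and genes are columns (variables).
pca <- prcomp(t(x), center = TRUE, scale. = FALSE)

dim(pca$x)         # one row of scores per ion
dim(pca$rotation)  # one row of loadings per gene

# Ion-by-ion Pearson correlations across genes (ions are the columns of x).
ion_cor <- cor(x)
round(ion_cor, 2)
```

Plotting the first two columns of `pca$x` gives one point per ion, which is the kind of view an ion PCA plot provides.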
The PCA plot is:
expl$plot.pca
Figure 8: Ion PCA plot on pre-processed data
The Pearson correlations between ions are shown in a correlation plot, a heatmap and a network plot:
expl$plot.corr
Figure 9: Ion correlation plots on pre-processed data
expl$plot.corr.heat
Figure 10: Ion correlation plots on pre-processed data
expl$plot.net
Figure 11: Ion correlation plots on pre-processed data
The correlation between ions and genes is shown in a heatmap with dendrograms:
expl$plot.heat
Figure 12: Correlation between ions and genes on pre-processed data